A Comparison of Approaches for Sentiment Classification on Lithuanian Internet Comments

نویسندگان

  • Jurgita Kapociute-Dzikiene
  • Algis Krupavicius
  • Tomas Krilavicius
چکیده

Despite many methods that effectively solve sentiment classification task for such widely used languages as English, there is no clear answer which methods are the most suitable for the languages that are substantially different. In this paper we attempt to solve Internet comments sentiment classification task for Lithuanian, using two classification approaches – knowledge-based and supervised machine learning. We explore an influence of sentiment word dictionaries based on the different parts-of-speech (adjectives, adverbs, nouns, and verbs) for knowledge-based method; different feature types (bag-ofwords, lemmas, word n-grams, character n-grams) for machine learning methods; and pre-processing techniques (emoticons replacement with sentiment words, diacritics replacement, etc.) for both approaches. Despite that supervised machine learning methods (Support Vector Machine and Naı̈ve Bayes Multinomial) significantly outperform proposed knowledge-based method all obtained results are above baseline. The best accuracy 0.679 was achieved with Naı̈ve Bayes Multinomial and token unigrams plus bigrams, when pre-processing involved diacritics replacement.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A High-Performance Model based on Ensembles for Twitter Sentiment Classification

Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people of the world intensity use social networks like Twitter. It supports users to publish tweets to tell what they are thinking about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...

متن کامل

Sentiment analysis methods in Sentiment analysis methods in Persian text: A survey

With the explosive growth of social media such as Twitter, reviews on e-commerce website, and comments on news websites, individuals and organizations are increasingly using opinions in these media for their decision making. Sentiment analysis is one of the techniques used to analyze userschr('39') opinions in recent years. Persian language has specific features and thereby requires unique meth...

متن کامل

Text Sentiment Classification Based on Mixed Cloud Vector Model Clustering and Kernel Fisher Discriminant

In today’s world, the web has dramatically changed the way that people express their opinions. People use the internet to express their opinion, attitude, feeling and emotion about films, goods, news etc. It is challenging to automatically classify mass subjectivity comments into different sentiment orientation categories (e.g. positive/negative). Furthermore, the ambiguity and randomness, whic...

متن کامل

Efficient Method Based on Combination of Deep Learning Models for Sentiment Analysis of Text

People's opinions about a specific concept are considered as one of the most important textual data that are available on the web. However, finding and monitoring web pages containing these comments and extracting valuable information from them is very difficult. In this regard, developing automatic sentiment analysis systems that can extract opinions and express their intellectual process has ...

متن کامل

CUD: Crowdsourcing for URL Spam Detection

The prevalence of spam URLs in Internet services, such as email, social networks, blogs and online forums has become a serious problem. These spam URLs host spam advertisements, phishing attempts, and malwares, which are harmful for normal users. Existing URL blacklist approaches offer limited protection. Although recent machine learning based URL classification approaches demonstrate good accu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013